AITopics | error distribution

While the ImageNet dataset has been driving computer vision research over the past decade, significant label noise and ambiguity have made top-1 accuracy an insufficient measure of further progress. To address this, new label sets and evaluation protocols have been proposed for ImageNet, showing that state-of-the-art models already achieve over 95% accuracy and shifting the focus to investigating why the remaining errors persist. Recent work in this direction employed a panel of experts to manually categorize all remaining classification errors for two selected models. However, this process is time-consuming, prone to inconsistencies, and requires trained experts, making it unsuitable for regular model evaluation and thus limiting its utility. To overcome these limitations, we propose the first automated error classification framework, a valuable tool to study how modeling choices affect error distributions. We use our framework to comprehensively evaluate the error distribution of over 900 models. Perhaps surprisingly, we find that across model architectures, scales, and pre-training corpora, top-1 accuracy is a strong predictor for the portion of all error types.
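The claim that top-1 accuracy predicts how errors split across types can be illustrated with a simple correlation across models. This is a minimal sketch under stated assumptions: the per-model error counts and category names are hypothetical placeholders (not the framework's taxonomy or data), and Pearson correlation is our choice of association measure; the paper may quantify predictiveness differently.

```python
from typing import Dict, List, Tuple

# Hypothetical input: (top-1 accuracy, {error category: count}) per model.
# Category names are placeholders, not the paper's error taxonomy.
ModelErrors = Tuple[float, Dict[str, int]]


def error_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of a model's errors that fall into each category."""
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length samples (assumes nonzero variance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def accuracy_vs_share(models: List[ModelErrors], category: str) -> float:
    """Correlate top-1 accuracy with the share of errors in one category."""
    accs = [acc for acc, _ in models]
    shares = [error_shares(counts).get(category, 0.0) for _, counts in models]
    return pearson(accs, shares)
```

A correlation near +1 or -1 for a category would mean its share of errors moves almost linearly with accuracy across the evaluated models.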

automated classification, model error, name change

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.56)


Approximate Multiplier Induced Error Propagation in Deep Neural Networks

Alahakoon, A. M. H. H., Saadat, Hassaan, Jayasinghe, Darshana, Parameswaran, Sri

arXiv.org Artificial Intelligence, Dec-9-2025

Deep Neural Networks (DNNs) rely heavily on dense arithmetic operations, motivating the use of Approximate Multipliers (AxMs) to reduce energy consumption in hardware accelerators. However, a rigorous mathematical characterization of how AxM error distributions influence DNN accuracy remains underdeveloped. This work presents an analytical framework that connects the statistical error moments of an AxM to the induced distortion in General Matrix Multiplication (GEMM). Using the Frobenius norm of the resulting error matrix, we derive a closed-form expression for practical DNN dimensions that shows the distortion is predominantly governed by the multiplier's mean error (bias). To evaluate this model in realistic settings, we incorporate controlled error injection into GEMM and convolution layers and examine its effect on ImageNet-scale networks. The predicted distortion correlates strongly with the observed accuracy degradation, and a case study of an error-configurable AxM implemented on an FPGA further confirms the analytical trends. By providing a lightweight alternative to behavioral or hardware-level simulations, this framework enables rapid estimation of AxM impact on DNN inference quality.
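Why the mean error can dominate is easy to see under a simplified model. Assume (our assumption, not necessarily the paper's formulation) that every scalar product in an (M x K) by (K x N) GEMM picks up an independent additive error with mean mu and standard deviation sigma. Each output entry then sums K such errors, so E[E_ij^2] = K*sigma^2 + (K*mu)^2 and E[||E||_F^2] = M*N*(K*sigma^2 + K^2*mu^2). The bias term grows with K^2 while the variance term grows only with K. The sketch below computes this expression and checks it by Monte Carlo; function names and the Gaussian error choice are illustrative.

```python
import random


def expected_frob_sq(m: int, n: int, k: int, mu: float, sigma: float) -> float:
    """E[||E||_F^2] for an (m x k) @ (k x n) GEMM under an additive i.i.d. error model.

    Assumption: each scalar product gets an independent error with mean mu and
    standard deviation sigma, regardless of the operand values.
    """
    # One output entry accumulates k errors: variance k*sigma^2, mean k*mu.
    per_entry = k * sigma ** 2 + (k * mu) ** 2
    return m * n * per_entry


def bias_share(k: int, mu: float, sigma: float) -> float:
    """Fraction of the expected distortion contributed by the mean error (bias)."""
    bias_term = (k * mu) ** 2
    return bias_term / (k * sigma ** 2 + bias_term)


def simulated_frob_sq(m: int, n: int, k: int, mu: float, sigma: float,
                      trials: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of E[||E||_F^2] with Gaussian per-product errors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        for _ in range(m * n):
            # Under the additive model the operand values do not enter the error,
            # so only the k accumulated per-product errors need sampling.
            e = sum(rng.gauss(mu, sigma) for _ in range(k))
            total += e * e
    return total / trials
```

For fixed mu and sigma, `bias_share` equals K*mu^2 / (sigma^2 + K*mu^2), which approaches 1 as the inner dimension K grows, consistent with the abstract's point about practical DNN dimensions.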

artificial intelligence, machine learning, multiplier

arXiv.org Artificial Intelligence

2512.06537

Country:

Oceania > Australia > New South Wales > Sydney (0.05)
Europe (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)

Genre: Research Report (1.00)

Industry: Energy (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)


Synthetic Error Injection Fails to Elicit Self-Correction In Language Models

Wu, David X., Kapur, Shreyas, Sahai, Anant, Russell, Stuart

arXiv.org Artificial Intelligence, Dec-3-2025

Reinforcement learning has become the dominant paradigm for eliciting reasoning and self-correction capabilities in large language models, but its computational expense motivates exploration of alternatives. Inspired by techniques from autonomous driving and robotics, we investigate whether supervised learning with synthetic error injection can induce self-correction abilities in language models. Our approach inserts artificial errors into reasoning chains, masks them, and supervises the model to recognize and correct these mistakes. Despite the intuitive appeal of this method, we find that it fails to significantly improve performance, even on simple synthetic tasks, across multiple models. Moreover, even when the model catches its own error, it often parrots the original mistake. We find that the distribution shift from synthetic errors to on-policy errors significantly degrades the error-correction capabilities of the fine-tuned model, even with good synthetic coverage of on-policy errors. Our results help explain why on-policy reinforcement learning methods have proven uniquely effective for eliciting self-correction.
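The inject-mask-correct recipe can be sketched as training-example construction. This is a minimal illustration under assumptions: the toy "a + b = c" task, the correction cue text, and the character-level loss mask are all hypothetical choices, not the paper's data format (real fine-tuning would mask at the token level).

```python
import random
from typing import Callable, List, Tuple

Segment = Tuple[str, bool]  # (text, include_in_loss)


def inject_error(steps: List[str], idx: int, corrupt: Callable[[str], str],
                 correction_cue: str = "Wait, that step is wrong. ") -> List[Segment]:
    """Corrupt step `idx`, mask it from the loss, and supervise a correction."""
    segments: List[Segment] = []
    for i, step in enumerate(steps):
        if i == idx:
            # The injected error stays in the context but is excluded from the
            # loss, so the model is not trained to produce the mistake itself.
            segments.append((corrupt(step), False))
            # The model is supervised to flag the error and restate the correct step.
            segments.append((correction_cue + step, True))
        else:
            segments.append((step, True))
    return segments


def corrupt_sum(step: str, rng: random.Random) -> str:
    """Hypothetical corruption for an 'a + b = c' step: perturb the result c."""
    lhs, rhs = step.split(" = ")
    return f"{lhs} = {int(rhs) + rng.choice([-2, -1, 1, 2])}"


def to_text_and_mask(segments: List[Segment]) -> Tuple[str, List[bool]]:
    """Join segments into one string with a per-character loss mask."""
    text, mask = "", []
    for seg, train in segments:
        piece = seg + "\n"
        text += piece
        mask.extend([train] * len(piece))
    return text, mask
```

For example, `inject_error(["3 + 4 = 7", "7 + 5 = 12"], 1, lambda s: corrupt_sum(s, random.Random(0)))` yields a wrong second step with `False`, followed by the cue plus the correct step with `True`. Because the errors are produced by `corrupt_sum` rather than by the model itself, such data is off-policy, which is the distribution shift the abstract identifies.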

large language model, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

2512.02389

Country: North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


Assessing model error in counterfactual worlds

Howerton, Emily, Lessler, Justin

arXiv.org Artificial Intelligence, Dec-2-2025

Counterfactual scenario modeling exercises that ask "what would happen if?" are one of the most common ways we plan for the future. Despite their ubiquity in planning and decision making, scenario projections are rarely evaluated retrospectively. Differences between projections and observations come from two sources: scenario deviation and model miscalibration. We argue that the latter is more important for assessing the value of models in decision making, but assessing it requires estimating model error in counterfactual worlds. Here we present and contrast three approaches for estimating this error, and demonstrate the benefits and limitations of each in a simulation experiment. We provide recommendations for the estimation of counterfactual error and discuss the components of scenario design that are required to make scenario projections evaluable.
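One simple way to see the two error sources is an additive decomposition. As an illustrative assumption (not necessarily the paper's formulation or one of its three approaches), let $y$ be the observed outcome, $s_k$ the scenario a projection assumed, $s^{*}$ the scenario that actually unfolded, and $\hat{y}(s)$ the model's projection under scenario $s$:

```latex
y - \hat{y}(s_k)
  = \underbrace{\bigl[\, y - \hat{y}(s^{*}) \,\bigr]}_{\text{model miscalibration}}
  + \underbrace{\bigl[\, \hat{y}(s^{*}) - \hat{y}(s_k) \,\bigr]}_{\text{scenario deviation}}
```

The miscalibration term requires $\hat{y}(s^{*})$, a projection under the realized scenario, which the original exercise usually did not produce. For the scenario that was projected, $s_k$, the matching outcome $y(s_k)$ is never observed when $s_k \neq s^{*}$. This is why evaluating the model means estimating error in a counterfactual world.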

artificial intelligence, modeling & simulation, scenario

arXiv.org Artificial Intelligence

2512.00836

Country:

North America > United States > North Carolina (0.04)
Europe > Spain (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)


Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback

Neural Information Processing Systems, Nov-20-2025, 13:52:19 GMT

It is known that the estimation bias hinges heavily on the ensemble size (i.e.,

artificial intelligence, machine learning, reinforcement learning

Neural Information Processing Systems

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
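The Adaptive Ensemble Q-learning abstract notes that estimation bias hinges heavily on the ensemble size. A well-known illustration uses a Maxmin-style target, max over actions of the min over N ensemble estimates. With N = 1, taking the max of noisy estimates overestimates. Larger N pulls the target down, eventually into underestimation. The sketch below measures that bias when all true values are 0 and pairs it with a hypothetical feedback rule that nudges N toward zero bias. Both the target form and the rule are assumptions for illustration, not the paper's algorithm, whose description is truncated here.

```python
import random
import statistics


def maxmin_target_bias(n_ensemble: int, n_actions: int = 10, noise: float = 1.0,
                       trials: int = 2000, seed: int = 0) -> float:
    """Mean of max_a min_i Q_i(a) when every true action value is 0.

    Each ensemble member's estimate is the true value (0) plus independent
    Gaussian noise, so the returned mean is the target's estimation bias.
    """
    rng = random.Random(seed)
    targets = []
    for _ in range(trials):
        per_action = [min(rng.gauss(0.0, noise) for _ in range(n_ensemble))
                      for _ in range(n_actions)]
        targets.append(max(per_action))
    return statistics.fmean(targets)


def adapt_ensemble_size(n: int, bias_estimate: float, tol: float = 0.05,
                        n_min: int = 1, n_max: int = 20) -> int:
    """Hypothetical error-feedback rule: grow the ensemble when overestimating,
    shrink it when underestimating, otherwise keep it unchanged."""
    if bias_estimate > tol:
        return min(n + 1, n_max)
    if bias_estimate < -tol:
        return max(n - 1, n_min)
    return n
```

In practice the true values are unknown, so any real feedback signal would have to come from an estimate of bias (for example, from observed returns). The `bias_estimate` argument stands in for that signal.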


Testing-driven Variable Selection in Bayesian Modal Regression

Duan, Jiasong, Zhang, Hongmei, Huang, Xianzheng

arXiv.org Machine Learning, Oct-29-2025

We propose a Bayesian variable selection method in the framework of modal regression for heavy-tailed responses. An efficient expectation-maximization algorithm is employed to expedite parameter estimation. A test statistic is constructed to exploit the shape of the model error distribution to effectively separate informative covariates from unimportant ones. Through simulations, we demonstrate and evaluate the efficacy of the proposed method in identifying important covariates in the presence of non-Gaussian model errors. Finally, we apply the proposed method to analyze two datasets arising in genetic and epigenetic studies.
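To show what an EM algorithm for modal regression looks like, here is a minimal sketch of the widely used kernel-based modal EM for a simple linear model. It is not the paper's Bayesian method or its test statistic. The E-step weights each observation by a Gaussian kernel of its current residual, and the M-step solves weighted least squares. Points far from the conditional mode get near-zero weight, which is why the fit targets the mode rather than the mean under heavy-tailed or skewed errors. The bandwidth `h=0.5`, the iteration count, and the ordinary-least-squares start are illustrative assumptions.

```python
import math
from typing import List, Tuple


def weighted_line_fit(xs: List[float], ys: List[float],
                      ws: List[float]) -> Tuple[float, float]:
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def modal_em(xs: List[float], ys: List[float], h: float = 0.5,
             iters: int = 50) -> Tuple[float, float]:
    """Linear modal regression via kernel-based EM (Gaussian kernel, bandwidth h)."""
    a, b = weighted_line_fit(xs, ys, [1.0] * len(xs))  # start from OLS
    for _ in range(iters):
        # E-step: weight proportional to a Gaussian kernel of each residual.
        ws = [math.exp(-0.5 * ((y - a - b * x) / h) ** 2) for x, y in zip(xs, ys)]
        # M-step: weighted least squares with those weights.
        a, b = weighted_line_fit(xs, ys, ws)
    return a, b
```

With errors that are mostly small but occasionally large and positive, OLS shifts the intercept upward, while `modal_em` recovers the line through the bulk of the data.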

artificial intelligence, covariate, machine learning

arXiv.org Machine Learning

2510.23831

Country:

North America > United States > South Carolina > Richland County > Columbia (0.14)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
Europe > United Kingdom > England > Isle of Wight (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Hematology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.95)


Neural Networks for Censored Expectile Regression Based on Data Augmentation

Cao, Wei, Wang, Shanshan

arXiv.org Machine Learning, Oct-24-2025

Expectile regression neural networks (ERNNs) are powerful tools for capturing heterogeneity and complex nonlinear structures in data. However, most existing research has focused on fully observed data, with limited attention paid to scenarios involving censored observations. In this paper, we propose a data augmentation-based ERNN algorithm, termed DAERNN, for modeling heterogeneous censored data. The proposed DAERNN is fully data-driven, requires minimal assumptions, and offers substantial flexibility. Simulation studies and real-data applications demonstrate that DAERNN outperforms existing censored ERNN methods and achieves predictive performance comparable to models trained on fully observed data. Moreover, the algorithm provides a unified framework for handling various censoring mechanisms without requiring explicit parametric model specification, thereby enhancing its applicability to practical censored data analysis.

Introduction

Expectile regression (ER), first proposed by Newey and Powell (1987), has been extensively studied as a flexible alternative to quantile regression (QR) for modeling heterogeneous distributions.
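Expectile regression replaces the quantile check loss with an asymmetric squared loss. Residuals above the fit are weighted by tau and those below by 1 - tau, so tau = 0.5 recovers the mean. The sketch below shows that loss and computes a sample's tau-expectile by iteratively reweighted means (asymmetric least squares). It illustrates only the objective an ERNN would minimize; it is not DAERNN, and it does not handle censoring or data augmentation. Function names and tolerances are illustrative.

```python
def expectile_loss(residual: float, tau: float) -> float:
    """Asymmetric squared loss: weight tau above the fit, 1 - tau below it."""
    weight = tau if residual >= 0 else 1.0 - tau
    return weight * residual * residual


def sample_expectile(ys, tau: float, tol: float = 1e-12, max_iter: int = 1000) -> float:
    """tau-expectile of a sample via iteratively reweighted means."""
    m = sum(ys) / len(ys)  # the 0.5-expectile is the mean, a natural start
    for _ in range(max_iter):
        # Reweight each point by which side of the current estimate it lies on;
        # the fixed point solves sum_i w_i * (y_i - m) = 0, the first-order condition.
        ws = [tau if y >= m else 1.0 - tau for y in ys]
        m_new = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
        converged = abs(m_new - m) < tol
        m = m_new
        if converged:
            break
    return m
```

For the two-point sample `[0.0, 10.0]`, the 0.9-expectile is 9.0. At that point the weighted residuals balance: 0.9 * (10 - 9) equals 0.1 * (9 - 0).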

artificial intelligence, machine learning, regression

arXiv.org Machine Learning

2510.20344

Country: